111 research outputs found

Multi-task Deep Neural Networks in Automated Protein Function Prediction

Author: Atalay Mehmet Volkan
Cetin-Atalay Rengul
Doğan Tunca
Martin Maria Jesus
Rifaioglu Ahmet Sureyya
Publication date: 01/05/2017

In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas, thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neural networks in drug discovery has drawn attention to deep learning algorithms in bioinformatics. Here, we propose a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to the protein function prediction problem, and we investigate various aspects of the proposed architecture through several experiments. First, we show that there is a positive correlation between the performance of the system and the size of the training datasets. Second, we investigate whether the level of GO terms in the GO hierarchy is related to their performance, and find no relation between the depth of a GO term in the hierarchy and its performance. In addition, we included all annotations in the training of a set of GO terms to investigate whether adding noisy data to the training datasets changes the performance of the system. The results show that including less reliable annotations in the training of deep neural networks significantly increased the performance of low-performing GO terms. We evaluated the performance of the system using a hierarchical evaluation method. The Matthews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for the molecular function, biological process and cellular component categories, respectively. We show that deep learning algorithms have great potential in protein function prediction. We plan to further improve DEEPred by including other types of annotations from various biological data sources, and to make DEEPred available as an open-access online tool. Comment: 19 pages, 4 figures, 4 tables
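
For readers unfamiliar with the metric reported in this abstract, the following is a minimal sketch of the standard binary Matthews correlation coefficient computed from confusion-matrix counts. It does not reproduce the hierarchical evaluation method the authors mention; the function name and the example counts are hypothetical and chosen only for illustration.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here), since the coefficient is undefined in that case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a single GO term (not taken from the paper).
perfect = matthews_corrcoef_from_counts(tp=50, tn=50, fp=0, fn=0)
chance = matthews_corrcoef_from_counts(tp=25, tn=25, fp=25, fn=25)
inverted = matthews_corrcoef_from_counts(tp=0, tn=0, fp=50, fn=50)
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which is why it is often preferred over accuracy when positive and negative examples are imbalanced.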

arXiv.org e-Print Archive

OpenMETU (Middle East Technical University)

DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks

Author: Atalay Mehmet Volkan
Atalay Rengül
Dogan Tunca
Martin Maria Jesus
Rifaioğlu Ahmet Süreyya
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/05/2019

Automated protein function prediction is critical for the annotation of uncharacterized protein sequences, where accurate prediction methods are still required. Recently, deep learning-based methods have outperformed conventional algorithms in computer vision and natural language processing due to the prevention of overfitting and efficient training. Here, we propose DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, as a solution to Gene Ontology (GO) based protein function prediction. DEEPred was optimized through rigorous hyper-parameter tests, and benchmarked using three types of protein descriptors, training datasets of varying sizes and GO terms from different levels. Furthermore, in order to explore how training with larger but potentially noisy data would change the performance, electronically made GO annotations were also included in the training process. The overall predictive performance of DEEPred was assessed using CAFA2 and CAFA3 challenge datasets, in comparison with the state-of-the-art protein function prediction methods. Finally, we evaluated selected novel annotations produced by DEEPred with a literature-based case study considering the 'biofilm formation process' in Pseudomonas aeruginosa. This study reports that deep learning algorithms have significant potential in protein function prediction, particularly when the source data is large. The neural network architecture of DEEPred can also be applied to the prediction of other types of ontological associations. The source code and all datasets used in this study are available at: https://github.com/cansyl/DEEPred

OpenMETU (Middle East Technical University)

Crowdsourced mapping of unexplored target space of kinase inhibitors

Author: Aittokallio Tero
Alexopoulos Leonidas
Allaway Robert J.
Atalay Mehmet Volkan
Atalay Rengül
Atas Heval
Bachmann Ivo
Barel Gal
Ben Guebila Marouen
Boeckaerts Dimitri
Briers Yves
Capuzzi Stephen
Chang Buru
Chen Jhih-Yu
Chen Tsai-Min
Cichonska Anna
Dang Kristen
De Baets Bernard
Doğan Tunca
Drewry David H.
Fang Wei-Quan
Fotis Christos
Ganzlin Julia
Guinney Justin
Herwig Ralf
Hu Hailin
Huang Chih-Han
Hunyady Laszlo
Hwang Ming-Jing
Isayev Olexandr
Jeon Minji
Kanev Georgi K.
Kang Jaewoo
Karimi Mostafa
Kim Bumsoo
Kim Sunkyu
Kooistra Albert J.
Koytiger Gregory
Lamb Andrew
Lee Junhyun
Li Shuya
Lienhard Matthias
Lim Hansaim
Lucic Bono
Luo Yunan
Martin Maria J.
Mason Michael
Misak Adam
Nguyen Thin
Ntagiantas Konstantinos
Oprea Tudor I.
Orsolic Davor
Ozturk Hakime
Park Sungjoon
Peng Jian
Popova Mariya
Prasse Paul
Ravikumar Balaguru
Rifaioğlu Ahmet Süreyya
Schlessinger Avner
Shamsaei Behrouz
Shen Yang
Shih Edward S. C.
Singh Sourav
Smuc Tomislav
Stepanic Visnja
Stock Michiel
Stolovitzky Gustavo
Szalai Bence
Tan Mehmet
Tanoli Ziaurrehman
Terzopoulos Panagiotis
Turu Gabor
Wan Fangping
Wang Xiaokang
Wang Zhangyang
Wells Carrow I.
Wennerberg Krister
Westerman Bart A.
Willson Timothy M.
Wu Chih-Hsun
Wu Di
Xie Lei
Yun Seongjun
Zeng Jianyang
Özgür Türkmen Arzucan
Özkırımlı Ölmez Elif
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2021

Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families tested on unpublished bioactivity data. We find that the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and that their ensemble leads to a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms together with the bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking prediction algorithms and for extending the druggable kinome. The IDG-DREAM Challenge carried out crowdsourced benchmarking of predictive algorithms for kinase inhibitor activities on unpublished data. This study provides a resource to compare emerging algorithms and prioritize new kinase activities to accelerate drug discovery and repurposing efforts.

OpenMETU (Middle East Technical University)

Image processing of Ottoman documents

Author: Atalay Mehmet Volkan
Publication venue: 'Middle East Technical University, Faculty of Architecture'
Publication date: 01/01/1990

OpenMETU (Middle East Technical University)

Learning by optimization in random neural networks

Author: Atalay Mehmet Volkan
Publication date: 28/10/1998

The random neural network model proposed by Gelenbe has a number of interesting features in addition to a well-established theory. Gelenbe has also developed a learning algorithm for the recurrent random network model using gradient descent on a quadratic error function. We present a quadratic optimization approach for learning in the random neural network, particularly for image texture reconstruction.

OpenMETU (Middle East Technical University)

Implicit motif distribution based hybrid computational kernel for sequence classification

Author: Atalay Mehmet Volkan
Atalay Rengül
Publication venue: 'Oxford University Press (OUP)'
Publication date: 15/04/2005

Motivation: We designed a general computational kernel for classification problems that require specific motif extraction and search from sequences. Instead of searching for explicit motifs, our approach finds the distribution of implicit motifs and uses it as a feature for classification. The implicit motif distribution approach may be used as a modus operandi for bioinformatics problems that require specific motif extraction and search, which is otherwise computationally prohibitive.

OpenMETU (Middle East Technical University)

Genome annotation based on subsequence analysis (Altdizi analizine dayalı genom anlamlandırması)

Author: Atalay Mehmet Volkan
Atalay Çetin Rengün
Öztürk Mehmet
Publication date: 30/09/2007

OpenMETU (Middle East Technical University)

Integration of ChIP-seq and microarray gene expression data

Author: Atalay Mehmet Volkan
Atalay Rengül
Işık Zerrin
Publication date: 17/04/2009

Dokuz Eylul University Research Information System

Evaluation of Signaling Cascades Based on the Weights from Microarray and ChIP-seq Data

Author: Atalay Mehmet Volkan
Atalay Rengül
Isik Zerrin
Publication date: 06/09/2009

In this study, we combined ChIP-seq and transcriptome data and integrated them into signaling cascades. Integration was realized through a framework based on a hybrid data- and model-driven approach. An enrichment model was constructed to evaluate signaling cascades that result in specific cellular processes. We used ChIP-seq and microarray data from public databases, obtained from HeLa cells under oxidative stress with similar experimental setups. Both ChIP-seq and array data were analyzed by percentile ranking so that the two data types could be integrated simultaneously on specific genes. Signaling cascades from the KEGG pathway database were subsequently scored by taking the sum of the individual scores of the genes involved in the cascade. This scoring information is then propagated along the route of the signaling cascade to form the final score. The signaling cascade model-based framework that we describe in this study is a novel approach that calculates scores for the target process of the analyzed signaling cascade, rather than assigning scores to gene product nodes.
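
The percentile-ranking and summation steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the two percentile ranks of a gene are combined, so the sketch simply adds them, and it omits the later step in which scores are propagated along the cascade. The function names, gene names and measurement values are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Return, for each gene, the fraction of genes whose value is <= its own."""
    ordered = sorted(values.values())
    n = len(ordered)
    return {gene: bisect_right(ordered, v) / n for gene, v in values.items()}


def cascade_score(cascade_genes, chip_ranks, expr_ranks):
    """Sum per-gene scores over the genes in one signaling cascade.

    Assumption: a gene's score is the sum of its two percentile ranks;
    a gene missing from a dataset contributes 0 for that dataset.
    """
    return sum(
        chip_ranks.get(gene, 0.0) + expr_ranks.get(gene, 0.0)
        for gene in cascade_genes
    )


# Hypothetical toy measurements (not from the study).
chip_signal = {"geneA": 10, "geneB": 20, "geneC": 30, "geneD": 40}
expression = {"geneA": 4.0, "geneB": 1.0, "geneC": 3.0, "geneD": 2.0}

chip_ranks = percentile_ranks(chip_signal)
expr_ranks = percentile_ranks(expression)
score = cascade_score(["geneA", "geneC"], chip_ranks, expr_ranks)
```

Converting both data types to percentile ranks puts read-count-like ChIP-seq values and intensity-like microarray values on a common 0 to 1 scale, which is what makes a per-gene sum meaningful.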

OpenMETU (Middle East Technical University)

Dokuz Eylul University Research Information System

Prediction of enzyme classes in a hierarchical approach by using SPMap

Author: Atalay Mehmet Volkan
Atalay Rengül
Yaman A.
Publication venue: 'Elsevier BV'
Publication date: 01/04/2010

Enzymes are proteins that play important roles in biochemical reactions as catalysts. They are classified according to the reactions they catalyze, in a hierarchical scheme defined by the International Enzyme Commission (EC). This hierarchical scheme is expressed as a four-level tree structure, and a unique number is assigned to each enzyme class. There are six major classes at the top level, defined by the type of reaction catalyzed, and the sub-classes at the lower levels correspond to more specific reactions within these classes. The aim of this study was to build a three-level classification model based on the hierarchical structure of EC classes. The ENZYME database was used to extract information on the EC classes, and enzymes were then assigned to these classes. Primary sequences of enzymes extracted from the UniProtKB/Swiss-Prot database were used to extract features. A subsequence-based feature extraction method, Subsequence Profile Map (SPMap), was used in this study. SPMap is a discriminative method that explicitly models the differences between positive and negative examples. SPMap considers the conserved subsequences of protein sequences in the same class. SPMap generates the feature vector of each sample protein as the probabilities of the fixed-length subsequences of this protein with respect to a probabilistic profile matrix calculated by clustering similar subsequences in the training dataset. In our case, positive and negative training datasets were prepared for each class, at each level of the tree structure. SPMap was used for feature extraction and Support Vector Machines (SVMs) were used for classification. Five-fold cross-validation was used to test the performance of the system. The overall sensitivity, specificity and AUC values for the six major EC classes are 93.08%, 98.95% and 0.993, respectively. The results at the second and third levels were also comparable to those of the six major classes.
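
The idea of preparing positive and negative training sets for each class at each of the three levels can be illustrated with a short sketch. It assumes a simple one-vs-rest split in which every protein outside a class is a negative; the abstract does not state how negatives were actually chosen. The function names and protein identifiers are hypothetical.

```python
def ec_prefixes(ec_number, depth=3):
    """Return the class labels of an EC number at levels 1..depth.

    For example, "3.4.21.4" gives ["3", "3.4", "3.4.21"].
    """
    parts = ec_number.split(".")
    return [".".join(parts[:i]) for i in range(1, min(depth, len(parts)) + 1)]


def one_vs_rest_sets(annotations, depth=3):
    """Build (positives, negatives) protein lists for every class label.

    annotations maps a protein identifier to its EC number.
    Assumption: all proteins not in a class are used as its negatives.
    """
    labels = {
        protein: set(ec_prefixes(ec, depth))
        for protein, ec in annotations.items()
    }
    classes = set()
    for protein_labels in labels.values():
        classes |= protein_labels
    return {
        cls: (
            sorted(p for p in labels if cls in labels[p]),
            sorted(p for p in labels if cls not in labels[p]),
        )
        for cls in classes
    }


# Hypothetical protein identifiers paired with EC numbers.
annotations = {"P1": "3.4.21.4", "P2": "3.4.22.2", "P3": "1.1.1.1"}
training_sets = one_vs_rest_sets(annotations)
```

In this toy input, the level-2 class "3.4" has P1 and P2 as positives and P3 as its only negative, while the level-3 class "3.4.21" has P1 alone as a positive.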

OpenMETU (Middle East Technical University)